
SYNTHETIC HIV PROTEASE GENE 

AND METHOD FOR ITS EXPRESSION 

BACKGROUND OF THE INVENTION 

Field of the Invention 

This invention relates to synthetic genes and
their expression products. Specifically, this
invention relates to a synthetic protease gene and its
expression product.
Description of the Related Art 

The presence of protease protein in purified
virion preparations was shown only by immunological
techniques. The HIV protease sequence, together with
the gag and pol sequences or fusion proteins, has been
expressed from viral DNA in bacteria. Examples of such
disclosures include: 1. Henderson, et al., 1988,
"Human Retroviruses, Cancer and AIDS: Approaches to
Prevention and Therapy", D. Bolognesi, Ed., published by
Alan R. Liss Inc., New York, NY, pp. 135-147;
2. Debouck, et al., 1987, P.N.A.S., 84:8903-8906; and
3. Mous, et al., 1988, J. Virol., 62:1433-1436.

The primary sequence of the HIV protease has
been determined by protein analysis and from the
nucleotide sequence of the proviral DNA. It was thus
determined that the protease is a 99 amino acid long
protein encoded by a 297 bp long stretch of the HIV
provirus. All previous experiments on the protease
gene and on its expression were carried out by
utilizing nucleotide sequences cloned from the cDNA
of the provirus. The inventors' work using synthetic
DNA proves that the nucleotide sequence of the proviral
DNA and also the deduced amino acid sequence are
correct.

The complete nucleotide sequence of the HIV-1
proviral DNA was published by Ratner et al., 1985,
Nature, 313:277-284. The sequence coding for the
protease in the pol open reading frame of HIV was
determined by previous analysis and corresponds to
nucleotides 1609 to 1906. The N-terminal and the
C-terminal amino acids are proline and phenylalanine,
respectively. This sequence coding for the HIV-1 99
amino acid protease is 297 bp long, as follows.

        10         20         30         40         50
CCTCAGATCA CTCTTTGGCA ACGACCCCTC GTCACAATAA AGATAGGGGG
GGAGTCTAGT GAGAAACCGT TGCTGGGGAG CAGTGTTATT TCTATCCCCC

        60         70         80         90        100
GCAACTAAAG GAAGCTCTAT TAGATACAGG AGCAGATGAT ACAGTATTAG
CGTTGATTTC CTTCGAGATA ATCTATGTCC TCGTCTACTA TGTCATAATC

       110        120        130        140        150
AAGAAATGAG TTTGCCAGGA AGATGGAAAC CAAAAATGAT AGGGGGAATT
TTCTTTACTC AAACGGTCCT TCTACCTTTG GTTTTTACTA TCCCCCTTAA

       160        170        180        190        200
GGAGGTTTTA TCAAAGTAAG ACAGTATGAT CAGATACTCA TAGAAATCTG
CCTCCAAAAT AGTTTCATTC TGTCATACTA GTCTATGAGT ATCTTTAGAC

       210        220        230        240        250
TGGACATAAA GCTATAGGTA CAGTATTAGT AGGACCTACA CCTGTCAACA
ACCTGTATTT CGATATCCAT GTCATAATCA TCCTGGATGT GGACAGTTGT

       260        270        280        290
TAATTGGAAG AAATCTGTTG ACTCAGATTG GTTGCACTTT AAATTTT
ATTAACCTTC TTTAGACAAC TGAGTCTAAC CAACGTGAAA TTTAAAA
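As an illustrative cross-check only (it is not part of the original disclosure), the following Python sketch confirms that each bottom row of the listing above is the base-by-base complement of the corresponding top row and that the top rows total 297 bases. The row contents are transcribed from the listing; the names TOP_ROWS, BOTTOM_ROWS and is_complement are hypothetical and chosen for this example.

```python
# Top (coding) rows and bottom (complementary) rows, transcribed from the
# listing above. All identifiers here are illustrative, not from the patent.
TOP_ROWS = [
    "CCTCAGATCA CTCTTTGGCA ACGACCCCTC GTCACAATAA AGATAGGGGG",
    "GCAACTAAAG GAAGCTCTAT TAGATACAGG AGCAGATGAT ACAGTATTAG",
    "AAGAAATGAG TTTGCCAGGA AGATGGAAAC CAAAAATGAT AGGGGGAATT",
    "GGAGGTTTTA TCAAAGTAAG ACAGTATGAT CAGATACTCA TAGAAATCTG",
    "TGGACATAAA GCTATAGGTA CAGTATTAGT AGGACCTACA CCTGTCAACA",
    "TAATTGGAAG AAATCTGTTG ACTCAGATTG GTTGCACTTT AAATTTT",
]
BOTTOM_ROWS = [
    "GGAGTCTAGT GAGAAACCGT TGCTGGGGAG CAGTGTTATT TCTATCCCCC",
    "CGTTGATTTC CTTCGAGATA ATCTATGTCC TCGTCTACTA TGTCATAATC",
    "TTCTTTACTC AAACGGTCCT TCTACCTTTG GTTTTTACTA TCCCCCTTAA",
    "CCTCCAAAAT AGTTTCATTC TGTCATACTA GTCTATGAGT ATCTTTAGAC",
    "ACCTGTATTT CGATATCCAT GTCATAATCA TCCTGGATGT GGACAGTTGT",
    "ATTAACCTTC TTTAGACAAC TGAGTCTAAC CAACGTGAAA TTTAAAA",
]

# Watson-Crick pairing: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def join_rows(rows):
    """Concatenate rows into one strand, dropping the block spacing."""
    return "".join(row.replace(" ", "") for row in rows)


def is_complement(strand_a, strand_b):
    """True if strand_b pairs with strand_a position by position."""
    return len(strand_a) == len(strand_b) and strand_a.translate(COMPLEMENT) == strand_b


top = join_rows(TOP_ROWS)
bottom = join_rows(BOTTOM_ROWS)

print("length of top strand:", len(top))
print("bottom rows complement top rows:", is_complement(top, bottom))
```

The comparison is made position by position, in the same left-to-right order in which the two rows are printed, so no strand reversal is applied.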



The industry is lacking a synthetic DNA
sequence that encodes a specific enzyme or protease
which is essential for the completion (replication) of
an infective human immunodeficiency virus (HIV). This
DNA sequence is desirable to express this protease by
recombinant methodology in bacteria and/or in
eukaryotic cells, and to produce enough protease for
biochemical and physical characterization in order to
design and produce potent inhibitors of this enzyme,
and thereby to block the production of infective HIV
particles.

BRIEF DESCRIPTION OF THE INVENTION

The invention is a gene for encoding a
protease of human immunodeficiency virus. The gene
consists essentially of a synthetic nucleotide sequence
for a protease essential to infectivity of human
immunodeficiency virus.

The protease is desirably a protease of HIV-1
or HIV-2 that is essential for the infectivity of these
viruses.

The preferred embodiment of this invention
is a synthetic gene, and the coding sequence for
expression of the HIV-1 protease is represented above
by the top rows of the nucleotide sequence.
BRIEF DESCRIPTION OF THE DRAWING

Figure 1 presents the expressed HIV protease
as analyzed in Western blot.

Figure 2 illustrates a strategy for the
synthesis of the HIV-1 protease gene. The 3' overhangs
are in lower case. The complementary strands (not
shown) were provided with 3' overhangs to match the
coding strands.

Figure 3 illustrates the induction of the
gene at various periods of time.

Figure 4 illustrates the activity of the
expressed protease using a synthetic peptide as a
substrate.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The invention is a synthetic DNA sequence for
encoding a specific enzyme or protease. The protease
is essential for the infectivity of the human
immunodeficiency virus (HIV). The invented gene is
desirable for the expression of the protease by
recombinant methodology in bacteria and/or in
eukaryotic cells and the production of a commercially
desirable amount of the protease for biochemical and
physical characterization. This characterization is
necessary for the design and production of potent
inhibitors of this enzyme. The invention also includes
synthesis and expression of the protease gene of other
retroviruses such as HIV-2, the human leukemia viruses
such as HTLV I and II, and other human and animal
RNA-containing viruses causing leukemia, sarcoma and other
malignancies.

The nucleotide sequence for the preferred
embodiment of this invention was obtained from a
published paper by Ratner, et al., supra. The sequence
in the pol open reading frame coding for the protease
of HIV-1 corresponds to nucleotides 1609 to 1906. The
N-terminal and the C-terminal amino acids are proline
and phenylalanine, respectively. This sequence coding
for the 99 amino acid protease is 297 bp long, as shown
above. Minor substitutions of one or more bases in this
and other genes useful in this invention can produce a
variant gene capable of expressing the desired
protease.

This sequence was synthesized as five
fragments using a DNA synthesizer. Complementary
strands corresponding to these five fragments were also
synthesized. 3' overhangs of four bases with
appropriate sequences were provided to efficiently
ligate each of the five fragments and to provide the
correct coding sequence of the protease gene.
The nucleotides ATG were added to the fragment corresponding
to the 5' end of the gene and TAA at the 3' end.
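The effect of adding ATG and TAA can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the inventors' procedure: the 297 bp coding sequence is transcribed from the listing above, the standard genetic code is assumed, the restriction-site protrusions described elsewhere are omitted, and the names GENE, ORF, CODON_TABLE and translate are hypothetical.

```python
# Coding (top-row) sequence transcribed from the listing above; spaces only
# separate the ten-base blocks and are removed below.
GENE = (
    "CCTCAGATCA CTCTTTGGCA ACGACCCCTC GTCACAATAA AGATAGGGGG "
    "GCAACTAAAG GAAGCTCTAT TAGATACAGG AGCAGATGAT ACAGTATTAG "
    "AAGAAATGAG TTTGCCAGGA AGATGGAAAC CAAAAATGAT AGGGGGAATT "
    "GGAGGTTTTA TCAAAGTAAG ACAGTATGAT CAGATACTCA TAGAAATCTG "
    "TGGACATAAA GCTATAGGTA CAGTATTAGT AGGACCTACA CCTGTCAACA "
    "TAATTGGAAG AAATCTGTTG ACTCAGATTG GTTGCACTTT AAATTTT"
).replace(" ", "")

# Standard genetic code (assumed), built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons in frame 1; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Initiation codon at the 5' end and termination codon at the 3' end.
ORF = "ATG" + GENE + "TAA"
protein = translate(ORF)

print("coding sequence length (bp):", len(GENE))
print("open reading frame length (bp):", len(ORF))
print("translated product:", protein)
```

Under these assumptions the translated product is an initiator methionine followed by the 99 residues running from proline to phenylalanine, then the stop signal.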

A procaryotic expression vector was used to
clone and then to express the synthetic sequence coding
for the protease. The expression can be in prokaryotes
(bacteria) or in other appropriate expression systems.
Recombinant clones were screened by colony hybridization
using a labelled fragment (62 bp) spanning the internal
region of the protease gene. Positive clones were
further analyzed for the size of the insert. Clones
which tested positive were induced for expression and
analyzed in Western blots to determine the protein
product using specific antibodies. Figure 1 gives an
example.

Of the clones screened so far, 3 clones have
been identified to express a product of 11.5 kd,
reacting with specific antibodies as illustrated in
Figure 1.

Conditions for the induction of the protease
gene were studied in E. coli and optimized. The
inventors have shown that the gene product has specific
protease activity, as it is capable of cleaving both
synthetic and natural substrates. The enzyme has been
purified by specific column chromatographic techniques,
including affinity chromatography. The method of this
invention can produce enough active protease to study
the structure of the protease and its mechanisms of
action, with a goal of devising specific inhibitors of
this enzyme for therapeutic application in the
treatment of the diseases, such as AIDS, caused by the
viruses. Other embodiments of this invention can
utilize a gene to express another protease, such as the
following gene for the HIV-2 protease.

CCTCAATTCTCTCTTTGGAAAAGACCAGTAGTCACAGCATACATTGAGGGTCAGCCA
GTAGAAGTCTTGTTAGACACAGGGGCTGACGACTCAATAGTAGCAGGAATAGAGTTA
GGGAACAATTATAGCCCAAAAATAGTAGGGGGAATAGGGGGATTCATAAATACCAAG
GAATATAAAAATGTAGAAATAGAAGTTCTAAATAAAAAGGTACGGGCCACCATAATG
ACAGGCGACACCCAATCAACATTTTTGGCAGAAATATTCTGACAGCCTTAGGCATGT
CATTAAATCTAC

Figure 1 demonstrates the expression of the
HIV protease in E. coli. Cells transformed with the
synthetic sequence of HIV protease in an appropriate
expression vector were induced and the bacterial lysate
was electrophoresed in SDS-PAGE. After transfer of
proteins onto a nitrocellulose membrane, an immunoblotting
procedure was performed using the specific antibody to
the HIV protease. Detection of the Ag-Ab complex was made
using I-125 protein A. In the autoradiograph, lane A
represents E. coli transformed with the plasmid, and
lanes B and C E. coli transformed with the plasmid
bearing synthetic DNA encoding the HIV protease. On the
left are protein molecular weight markers in
kilodaltons. The 11.5 kd band is the protease.

The synthetic DNA of the invention also 
obviates any need to manipulate (infectious) viral 
material and overcomes limitations in the quantities 
which can be obtained by other means. 
EXAMPLES 

The following materials and methods were used 
to perform the examples. 

PLASMID, BACTERIAL STRAINS, AND CHEMICALS

Plasmid PKK233-2, a procaryotic expression
vector, was purchased from Pharmacia. PKK233-2 was used
to transform a lacIq host, E. coli JM105 or
RB791. The cells were selected in M9 minimal media
containing 1 ug/ml thiamine, prior to using them for
transformation. All chemicals utilized in the
synthesis of oligonucleotides were from Applied
Biosystems Inc. T4 polynucleotide kinase, DNA ligase,
and Klenow fragment of E. coli DNA polymerase I were
obtained from New England Biolabs. Restriction
endonucleases, PMSF and IPTG were from Boehringer
Mannheim, Bethesda Research Laboratories and
Promega, respectively.

DNA SYNTHESIS, PLASMID CONSTRUCTION AND SCREENING

DNA fragments were synthesized using an ABI
DNA synthesizer (model 381A). All synthetic fragments
were purified by electrophoresis in a 12%
polyacrylamide/8M urea sequencing gel. DNA was
visualized by UV-shadowing and full-length fragments
were eluted from the gel as known in the art. The
full-length fragments were checked for their purity
using standard techniques.

Appropriate complementary fragments were
mixed in equimolar concentrations, annealed, kinased
and ligated as described elsewhere. The efficiency of
ligation was monitored by polyacrylamide gel
electrophoresis. The linearized plasmid and the
protease gene in appropriate concentrations were
ligated and used for transformation of E. coli JM105.
Recombinant clones were screened by colony
hybridization using a 62 bp fragment labelled by
kinasing. Small scale isolation of plasmid DNA from
the recombinant clones was performed by the boiling
method and the size of the inserts was visualized by
autoradiography after labelling the 3' recessed
termini using the Klenow fragment of E. coli DNA
polymerase.

ANTIBODIES TO THE HIV PROTEASE

The polyclonal antibodies were raised in
rabbits against (i) a complete synthetic sequence of 1
to 99 amino acids of the HIV-1 protease and (ii) a
tridecapeptide corresponding to the C-terminus of the
protease.

ANALYSIS OF THE EXPRESSED PROTEINS

E. coli cells bearing the appropriate plasmid
construct were grown to log phase, induced, and lysed
by sonication. Total cell extracts were analysed by
NaDodSO4/PAGE and subjected to immunoblot analysis.
ASSAY FOR THE ACTIVITY OF THE EXPRESSED PROTEASE 

Oligopeptides were synthesized in a Peptide
Synthesizer (Applied Biosystems Model 430A), according
to the method previously published (Copeland and
Oroszlan, 1981). The cleavage products were analysed
by RP-HPLC on a uBondapak C18 column (Waters
Associates). Peak fractions were analysed for amino
acid composition using a Pico-Tag amino acid analyser
(Waters Associates).
EXAMPLE 1 

This example represents the preferred 
embodiment. 

RESULTS:

SYNTHESIS OF THE FULL-LENGTH PROTEASE GENE

The nucleotide sequence of the protease gene
was taken from Ratner et al. The sequence in the pol
open reading frame for the protease gene starts at
nucleotide 1609 and ends at 1906, coding for 99
amino acids. This sequence and its complement were
synthesized as five individual fragments of
approximately 60 bases as shown in Figure 2. 3'
overhangs of 4 bases (shown in lower case) were
provided for the fragments to selectively ligate the
appropriate fragments to form the correct coding
sequence. Translational initiation codon ATG and
termination codon TAA were provided at the appropriate
ends of the protease gene. A sequence was added to
provide a protrusion at the 5' end of the gene, having
a cohesive end compatible with the restriction enzyme
site NcoI. The 5' protrusion at the 3' end of the
gene was added to provide a HindIII compatible end. The
complementary strands (not shown) were provided with 3'
overhangs to match the coding strands.

EXPRESSION OF THE SYNTHETIC HIV-1 PROTEASE GENE IN E. 
COLI 

Three clones (PR-C, PR-H, and PR-J) bearing
the correct coding sequence of 297 bp in the expression
vector PKK233-2 were analyzed for expression to select
conditions for the optimal induction of the gene.
Figure 3 shows examples of Western blot analysis of the
gene product.

Figure 3 illustrates expression of the
synthetic protease gene in E. coli. Clone PR-C bearing
the coding sequence for the protease was induced for
expression. The proteins (75 ug of bacterial extract)
were electrophoresed in NaDodSO4/PAGE, transferred to
nitrocellulose and subjected to immunoblot analysis
using a mixture of the two protease-specific rabbit
polyclonal antibodies raised against (i) a complete
synthetic sequence of 1-99 amino acids of the HIV-1
protease and (ii) a tridecapeptide corresponding to the
C-terminus of the protease. Figure 3A shows the
induction of the gene with 0.4 mM IPTG at various
periods of time. Figure 3B shows the induction for 30
minutes with increasing concentrations of the inducer
IPTG. Lanes 1-5 represent IPTG concentrations of 0.28,
0.56, 1.12, 2.24 and 4.48 mM, respectively. Figure 3C
shows the analysis after 60 minutes of induction with
1 mM IPTG and lysing the cells in various buffers. B1
denotes lysis of cells in 50 mM Tris-HCl at pH 7.0,
150 mM NaCl, 1 mM EDTA, 1 mM PMSF, 1 mM DTT and 0.5 percent
NP-40. B2 is the same as B1, but without NaCl and
EDTA. B3 is in 50 mM potassium phosphate at pH 6.0, 1 mM
PMSF and 1 mM DTT. B4 is the same as B3 with a pH of
6.5. Positions of protein molecular weight markers are
indicated on the left in kilodaltons.

E. coli cells bearing plasmid PR-C were grown
in Luria broth to an optical density (A600nm) of 0.4, and
then induced for various periods of time for expression
from the trc promoter by adding IPTG
(isopropyl-beta-D-thiogalactopyranoside)
at a concentration of 0.4 mM as
seen in Figure 3A. The cloned gene expressed a single,
unfused protein band of 11.5 kd. Expression was maximal
after 30 minutes of induction. This level decreased to
about 25 percent at 60 minutes. There was no
detectable expression after 120 minutes of induction
and at 0 minutes. This pattern of induction was
similar in the other clones (PR-H and PR-J) that were
analyzed (not shown).

The results of the induction for 30 minutes
with varying concentrations of inducer are shown in
Figure 3B. Induction with IPTG in the range of 1 mM to
4 mM resulted in the maximum amount of expression. Similar
data were obtained on clones PR-H and PR-J (not shown).

In order to select the conditions that
efficiently solubilize the protease for enzymatic
analysis, different buffer systems were used for the
lysis of cells (clone PR-C) after optimal induction
with 1 mM IPTG. It was observed that sonication in a
buffer system of 50 mM Tris-Cl at pH 7.5, 1 mM DTT, 1 mM
PMSF and 0.5% Nonidet P-40 released 50 to 70 percent of
the protease in the soluble fraction (Figure 3C). This
was estimated by Western blot analysis of aliquots of
soluble extract and insoluble pellet for the content of
the expressed product.

DEMONSTRATION OF SPECIFIC PROTEOLYTIC ACTIVITY

Figure 4 illustrates the activity of the
expressed protease using a synthetic peptide as a
substrate. Protease assays were carried out at 37°C with
22.5 ug of bacterial lysate obtained from clone
PR-C, induced (A, B, C) or uninduced (D), and from control cells
bearing just the plasmid PKK233-2 (data not shown).
The nonapeptide was used as a substrate in reaction
buffer (0.25 M potassium phosphate, pH 7.0, 0.5
percent (v/v) NP-40, 5 percent (v/v) glycerol, 5 mM
dithiothreitol and 2 M NaCl). Aliquots of 25 ul each were
taken at 0 hours (A), 1 hour (B), 3 hours (C) and 6
hours (D) and analyzed by RP-HPLC. S denotes the substrate,
and P1 and P2 denote cleavage products 1 and 2, respectively.

To assess the activity of the cloned HIV-1
protease, a synthetic nonapeptide corresponding to the
HIV-1 p17-p24 cleavage site (Henderson, et al., 1988)
was used as a substrate (4E). The substrate in
reaction buffer was mixed with aliquots of various cell
extracts (see description of Figure 4 above) and
incubated at 37°C. Equal aliquots of the incubation mixture
were taken at various time points and analyzed by RP-HPLC.
The substrate in the 0 hour sample eluted as a
single peak as shown in Figure 4A. After incubation
for 1 hour, two newly appearing peaks, products
labelled P1 and P2, can be seen, correlating with a
significant decrease of the substrate peak. Subsequent
amino acid analysis of the recovered peaks demonstrated
that product 1 and product 2 corresponded to the
expected cleavage products as shown in Table 1, proving
a Tyr-Pro bond cleavage, which is the determined
natural cleavage site. Extended incubation for 3 hours
showed a further decrease of the substrate peak and a
substantial increase in the peak height of product 1,
indicating progression of the hydrolysis of the Tyr-Pro
bond. However, the peak of product 1 seems to be
smaller, as expected, since the absorbance of the
tetrapeptide Pro-Ile-Val-Glu-NH2 is substantially
smaller than that of the pentapeptide having a free
COOH-terminal tyrosine. The increase of products 1 and 2
after 3 hours of incubation corresponded to a
decrease of the substrate peak.

No cleavage products have been detected in
reactions using extracts from uninduced cells of clone
PR-C (Figure 4D) and from control cells (control plasmid
PKK233-2; data not shown). There was no decrease in
the substrate peak even after 6 hours of incubation
(Figure 4D), indicating that the nonapeptide is
resistant to degradation by bacterial proteases. This
makes this substrate especially useful for assaying
viral protease activities in crude extracts,
facilitating purification and isolation of the
protease.

The amino acid composition data for the
substrate and its cleavage products are shown in Table
1. The amounts of observed amino acids correspond
clearly to the expected amounts, demonstrating that the
cleavage occurs at the expected cleavage site of the
synthetic peptide corresponding to the p17-p24 site of
the gag precursor.

IN THE CLAIMS

1. A gene for encoding a protease of human
immunodeficiency virus consisting essentially of:

a synthetic nucleotide sequence for a
protease essential to infectivity of human
immunodeficiency virus.

2. The gene of claim 1 wherein said protease
is essential for infectivity of a retrovirus.

3. The gene of claim 2 wherein said
retrovirus is a member of the group consisting of HIV-1,
HIV-2, and HTLV, a Human Leukemia virus.

4. A gene for encoding a protease of human 
immunodeficiency virus consisting essentially of: 

a synthetic double stranded nucleotide 
sequence of which the coding sequence is: 



        10         20         30         40         50
CCTCAGATCA CTCTTTGGCA ACGACCCCTC GTCACAATAA AGATAGGGGG

        60         70         80         90        100
GCAACTAAAG GAAGCTCTAT TAGATACAGG AGCAGATGAT ACAGTATTAG

       110        120        130        140        150
AAGAAATGAG TTTGCCAGGA AGATGGAAAC CAAAAATGAT AGGGGGAATT

       160        170        180        190        200
GGAGGTTTTA TCAAAGTAAG ACAGTATGAT CAGATACTCA TAGAAATCTG

       210        220        230        240        250
TGGACATAAA GCTATAGGTA CAGTATTAGT AGGACCTACA CCTGTCAACA

       260        270        280        290
TAATTGGAAG AAATCTGTTG ACTCAGATTG GTTGCACTTT AAATTTT

5. A method for expressing a protease
consisting essentially of inserting a recombinant
vector containing a synthetic gene for a protease
essential for infectivity of a retrovirus into a host
cell;

expressing said gene; and
separating said protease.

6. The process of claim 5 wherein said
retrovirus is a member of the group consisting of HIV-1,
HIV-2, and HTLV, a Human Leukemia virus.

7. The process of claim 5 wherein said
retrovirus is HIV-1 and said gene has a nucleotide
sequence of

TTTGCCAGGA AGATGGAAAC CAAAAATGAT

ACAGTATGAT

FIG. 2

fragment 1

5' CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAActaa

fragment 2

AGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAagat

fragment 3

GGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAgata

fragment 4

CTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTcaac

fragment 5

ATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTT 3'
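
The five-fragment layout of FIG. 2 can be checked with a short Python sketch. It is offered as an illustration only, assuming the fragments are joined head to tail in the order shown; the fragment strings are transcribed from FIG. 2, and the names FRAGMENTS and lowercase_tail are hypothetical.

```python
# Coding-strand fragments transcribed from FIG. 2. Lower-case letters mark
# the four-base 3' overhang at the end of fragments 1 to 4.
FRAGMENTS = [
    "CCTCAGATCACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAActaa",
    "AGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAagat",
    "GGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAgata",
    "CTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTcaac",
    "ATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTT",
]


def lowercase_tail(fragment):
    """Return the run of lower-case letters at the 3' end of a fragment."""
    tail = ""
    for base in reversed(fragment):
        if not base.islower():
            break
        tail = base + tail
    return tail


fragment_lengths = [len(f) for f in FRAGMENTS]
overhangs = [lowercase_tail(f) for f in FRAGMENTS]

# Joining the fragments in order (overhang bases included, since they are
# part of the coding strand) gives the assembled coding sequence.
assembled = "".join(FRAGMENTS).upper()

print("fragment lengths:", fragment_lengths)
print("3' overhangs:", overhangs)
print("assembled length (bp):", len(assembled))
print("first codon:", assembled[:3], "last codon:", assembled[-3:])
```

The ATG and TAA codons and the NcoI and HindIII compatible ends described in the text are not drawn in FIG. 2 and are therefore left out of this sketch.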